Abstract: We derive a simple general parametric representation
of the rate-distortion function of a memoryless source, where
both the rate and the distortion are given by integrals whose
integrands include the minimum mean square error (MMSE) of
the distortion $\Delta = d(X, Y)$ based on the source symbol $X$,
with respect to a certain joint distribution of these two random
variables. At first glance, these relations may seem somewhat
similar to the I-MMSE relations due to Guo, Shamai and Verdú,
but they are, in fact, quite different. The new relations among
rate, distortion, and MMSE are discussed from several aspects,
and more importantly, it is demonstrated that they can sometimes
be rather useful for obtaining non-trivial upper and lower
bounds on the rate-distortion function, as well as for determining
the exact asymptotic behavior for very low and for very large
distortion. Analogous MMSE relations hold for channel capacity
as well.

Index Terms — Rate-distortion function, Legendre transform, 
estimation, minimum mean square error. 



I. Introduction 

It has been well known for many years that the derivation of
the rate-distortion function of a given source and distortion
measure does not lend itself to closed-form expressions,
even in the memoryless case, except for a few very simple
examples |[I|,|l2l,|[3l,||3. This has triggered the derivation of 
some upper and lower bounds, both for memoryless sources 
and for sources with memory. 

One of the most important lower bounds on the rate- 
distortion function, which is applicable for difference distor- 
tion measures (i.e., distortion functions that depend on their 
two arguments only through the difference between them), 
is the Shannon lower bound in its different forms, e.g., the 
discrete Shannon lower bound, the continuous Shannon lower 
bound, and the vector Shannon lower bound. This family of 
bounds is especially useful for semi-norm-based distortion 
measures [5, Section 4.8]. The Wyner-Ziv lower bound [14]
for a source with memory is a convenient bound, which 
is based on the rate-distortion function of the memoryless 
source formed from the product measure pertaining to the 
single-letter marginal distribution of the original source and 
it may be combined elegantly with the Shannon lower bound. 
The autoregressive lower bound asserts that the rate-distortion 
function of an autoregressive source is lower bounded by the 
rate-distortion function of its innovation process, which is 
again, a memoryless source. 

Upper bounds are conceptually easier to derive, as they may
result from the performance analysis of a concrete coding
scheme, or from random coding with respect to (w.r.t.) an
arbitrary random coding distribution, etc. One well known
example is the Gaussian upper bound, which upper bounds 
the rate-distortion function of an arbitrary memoryless (zero- 
mean) source w.r.t. the squared error distortion measure by the 
rate-distortion function of the Gaussian source with the same 
second moment. If the original source has memory, then the 
same principle generalizes with the corresponding Gaussian 
source having the same autocorrelation function as the original 
source H] Section 4.6]. 

In this paper, we focus on a simple general parametric 
representation of the rate-distortion function which seems to 
set the stage for the derivation of a rather wide family of 
both upper bounds and lower bounds on the rate-distortion 
function. In this parametric representation, both the rate and 
the distortion are given by integrals whose integrands include 
the minimum mean square error (MMSE) of the distortion 
based on the source symbol, with respect to a certain joint 
distribution of these two random variables. More concretely, 
given a memoryless source designated by a random variable
(RV) $X$, governed by a probability function $p(x)$, a reproduction
variable $Y$, governed by a probability function $q(y)$, and
a distortion measure $d(x, y)$, the rate and the distortion can be
represented parametrically via a real parameter $s \in [0, \infty)$ as
follows:

$$D_s = D_0 - \int_0^s d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) = D_\infty + \int_s^\infty d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) \qquad (1)$$

and

$$R_q(D_s) = \int_0^s d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) = R_q(D_\infty) - \int_s^\infty d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X), \qquad (2)$$

where $D_s$ is the distortion pertaining to parameter value $s$,
$R_q(D_s)$ is the rate-distortion function w.r.t. reproduction
distribution $q$, computed at $D_s$, $\Delta = d(X, Y)$, and $\mathrm{mmse}_s(\Delta|X)$
is the MMSE of estimating $\Delta$ based on $X$, where the joint
probability function of $(X, \Delta)$ is induced by the following
joint probability function of $(X, Y)$:

$$p_s(x, y) = p(x) \cdot w_s(y|x) = p(x) \cdot \frac{q(y) e^{-s d(x,y)}}{Z_x(s)}, \qquad (3)$$

where $Z_x(s)$ is a normalization constant, given by
$\int dy\, q(y) e^{-s d(x,y)}$ in the continuous case, or $\sum_y q(y) e^{-s d(x,y)}$
in the discrete case.

At first glance, eq. (2) looks somewhat similar to the I-MMSE
relation of [6], which relates the mutual information
between the input and the output of an additive white Gaussian
noise (AWGN) channel and the MMSE of estimating the
channel input based on the noisy channel output. As we
discuss later on, however, eq. (2) is actually very different
from the I-MMSE relation in many respects. In this context,
it is important to emphasize that a relation analogous to (2)
applies also to channel capacity, as will be discussed in the
sequel.

The relations (1) and (2) have actually already been raised in
a companion paper [9] (see also [10] for a conference version).
Their derivation there was triggered and inspired by certain
analogies between the rate-distortion problem and statistical
mechanics, which were the main theme of that work. However,
the significance and the usefulness of these rate-distortion-MMSE
relations were not explored in [9] and [10].

It is the purpose of the present work to study these relations
more closely and to demonstrate their utility, which
is, as said before, in deriving upper and lower bounds. The
underlying idea is that bounds on $R_q(D)$ (and sometimes
also on $R(D) = \min_q R_q(D)$) may be obtained via relatively
simple bounds on the MMSE of $\Delta$ based on $X$. These bounds
can either be simple technical bounds on the expression of
the MMSE itself, or bounds that stem from pure estimation-theoretic
considerations. For example, upper bounds may be
derived by analyzing the mean square error (MSE) of a certain
sub-optimum estimator, e.g., a linear estimator, which is easy
to analyze. Lower bounds can be taken from the available
plethora of lower bounds offered by estimation theory, e.g.,
the Cramér-Rao lower bound.

Indeed, an important part of this work is a section of 
examples, where it is demonstrated how to use the proposed 
relations and derive explicit bounds from them. In one of these 
examples, we derive two sets of upper and lower bounds, 
one for a certain range of low distortions and the other, for 
high distortion values. At both edge-points of the interval 
of distortion values of interest, the corresponding upper and 
lower bounds asymptotically approach the limiting value with
the same leading term, and so, they sandwich the exact 
asymptotic behavior of the rate-distortion function, both in 
the low distortion limit and in the high distortion limit. 

The outline of this paper is as follows. In Section II, we 
establish notation conventions. In Section III, we formally 
present the main result, prove it, and discuss its significance 
from the above-mentioned aspects. In Section IV, we provide 
a few examples that demonstrate the usefulness of the MMSE 
relations. Finally, in Section V, we summarize and conclude. 

II. Notation Conventions 

Throughout this paper, RV's will be denoted by capital
letters, their sample values will be denoted by the respective
lower case letters, and their alphabets will be denoted by the
respective calligraphic letters. For example, $X$ is a random
variable, $x$ is a specific realization of $X$, and $\mathcal{X}$ is the alphabet
in which $X$ and $x$ take on values. This alphabet may be finite,
countably infinite, or a continuum, like the real line $\mathbb{R}$ or an
interval $[a, b] \subset \mathbb{R}$.

Sources and channels will be denoted generically by the 
letter p, or q, which will designate also their corresponding 
probability functions, i.e., a probability density function (pdf) 
in the continuous case, or a probability mass function (pmf) 
in the discrete case. Information-theoretic quantities, like 
entropies and mutual informations, will be denoted according 
to the usual conventions of the information theory literature, 
e.g., $H(X)$, $I(X;Y)$, and so on. If a RV is continuous-valued,
then its differential entropy and conditional differential
entropy will be denoted with $h$ instead of $H$, i.e., $h(X)$
is the differential entropy of $X$, $h(X|Y)$ is the
conditional differential entropy of $X$ given $Y$, and so on. The
expectation operator will be denoted, as usual, by $E\{\cdot\}$.

Given a source RV $X$, governed by a probability function
$p(x)$, $x \in \mathcal{X}$, a reproduction RV $Y$, governed by a
probability function $q(y)$, $y \in \mathcal{Y}$, and a distortion measure
$d: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$, we define the rate-distortion function of
$X$ w.r.t. distortion measure $d$ and reproduction distribution $q$
as

$$R_q(D) = \min I(X;Y), \qquad (4)$$

where $X \sim p$ and the minimum is across all channels
$\{w(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ that satisfy $E\{d(X,Y)\} \le D$
and $E\{w(y|X)\} = q(y)$ for all $y \in \mathcal{Y}$. Clearly, the rate-distortion
function, $R(D)$, is given by $R(D) = \min_q R_q(D)$.
We will also use the notation $\Delta = d(X,Y)$. Obviously, since
$X$ and $Y$ are RV's, then so is $\Delta$.

III. MMSE Relations: Basic Result and Discussion 

Throughout this section, our definitions will assume that
both $\mathcal{X}$ and $\mathcal{Y}$ are finite alphabets. Extensions to continuous
alphabets will be obtained by a limit of fine quantizations, 
with summations eventually being replaced by integrations. 

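Referring to the notation defined in Section II, for a given
positive real $s$, define the conditional probability function

$$w_s(y|x) = \frac{q(y) e^{-s d(x,y)}}{Z_x(s)}, \qquad (5)$$

where

$$Z_x(s) = \sum_{y \in \mathcal{Y}} q(y) e^{-s d(x,y)}, \qquad (6)$$

and the joint pmf

$$p_s(x, y) = p(x) w_s(y|x). \qquad (7)$$

Further, let

$$\mathrm{mmse}_s(\Delta|X) = E_s\{[\Delta - E_s\{\Delta|X\}]^2\} = E_s\{[d(X,Y) - E_s\{d(X,Y)|X\}]^2\}, \qquad (8)$$

where $E_s\{\cdot\}$ is the expectation operator w.r.t. $\{p_s(x, y)\}$, and
defining $\psi(x)$ as the conditional expectation $E_s\{d(x, Y)|X = x\}$
w.r.t. $\{w_s(y|x)\}$, $E_s\{d(X,Y)|X\}$ is defined as $\psi(X)$.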

Our main result, in this section, is the following (the proof 
appears in the Appendix): 
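Theorem 1: The function $R_q(D)$ can be represented parametrically
via the parameter $s \in [0, \infty)$ as follows:
(a) The distortion is obtained by

$$D_s = D_0 - \int_0^s d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) = D_\infty + \int_s^\infty d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X), \qquad (9)$$

where

$$D_0 = \sum_{x,y} p(x) q(y) d(x,y) \qquad (10)$$

and

$$D_\infty = \sum_x p(x) \min_y d(x,y). \qquad (11)$$

(b) The rate is given by

$$R_q(D_s) = \int_0^s d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) = R_q(D_\infty) - \int_s^\infty d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X). \qquad (12)$$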

In the remaining part of this section, we discuss the 
significance and the implications of Theorem 1 from several 
aspects. 
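The two first-line integral identities of Theorem 1 can be checked numerically on a small finite-alphabet problem. The following is only a minimal sketch: the pmfs `p`, `q`, the distortion matrix `d`, and all function names are hypothetical choices made for illustration, and the integrals are approximated by a plain trapezoidal rule. The rate is compared against the value of the bracketed expression in the Legendre-type representation of $R_q(D)$ evaluated at the same $s$.

```python
import numpy as np

def tilted_channel(q, d, s):
    """w_s(y|x) = q(y) exp(-s d(x,y)) / Z_x(s); rows are indexed by x."""
    w = q[None, :] * np.exp(-s * d)
    return w / w.sum(axis=1, keepdims=True)

def log_partition(p, q, d, s):
    """sum_x p(x) ln Z_x(s)."""
    return float(p @ np.log((q[None, :] * np.exp(-s * d)).sum(axis=1)))

def distortion(p, q, d, s):
    """D_s = E_s{d(X,Y)} under the joint pmf p(x) w_s(y|x)."""
    w = tilted_channel(q, d, s)
    return float(p @ (w * d).sum(axis=1))

def mmse(p, q, d, s):
    """mmse_s(Delta|X): p-average of the conditional variance of d(x,Y)."""
    w = tilted_channel(q, d, s)
    mean = (w * d).sum(axis=1)
    second = (w * d ** 2).sum(axis=1)
    return float(p @ (second - mean ** 2))

def trapezoid(values, grid):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

# Hypothetical toy problem: 3 source letters, 2 reproduction letters.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.4])
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.5, 2.0]])

s = 1.5
grid = np.linspace(0.0, s, 3001)
m = np.array([mmse(p, q, d, t) for t in grid])

D0 = float(p @ d @ q)                                   # distortion at s = 0
D_integral = D0 - trapezoid(m, grid)                    # D_0 minus integrated MMSE
D_direct = distortion(p, q, d, s)                       # E_s{d(X,Y)} computed directly

R_integral = trapezoid(grid * m, grid)                  # integral of s * mmse
R_direct = -(s * D_direct + log_partition(p, q, d, s))  # -[s D_s + sum_x p(x) ln Z_x(s)]

print(D_integral, D_direct)
print(R_integral, R_direct)
```

The two printed pairs should agree up to the quadrature error, which shrinks as the grid is refined.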

Some General Technical Comments 

The parameter $s$ has the geometric meaning of the negative
local slope of the function $R_q(D)$. This is easily seen by
taking the derivatives of (9) and (12), i.e., $dR_q(D_s)/ds = s \cdot \mathrm{mmse}_s(\Delta|X)$
and $dD_s/ds = -\mathrm{mmse}_s(\Delta|X)$, whose ratio
is $R_q'(D_s) = -s$. This means also that the parameter $s$ plays
the same role as in the well known parametric representa- 
tions of 111] and 13], which is to say that it can also be 
thought of as the Lagrange multiplier of the minimization 
of $[I(X;Y) + sE\{d(X,Y)\}]$ subject to the reproduction
distribution constraint. 

On a related note, we point out that Theorem 1 is based on
the following representation of $R_q(D)$:

$$R_q(D) = -\min_{s \ge 0}\left[sD + \sum_x p(x) \ln Z_x(s)\right], \qquad (13)$$

which we prove in the Appendix as the first step in the proof
of Theorem 1.
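To make the Legendre-type representation of $R_q(D)$ and the slope interpretation of $s$ concrete, here is a small numerical sketch. The toy pmfs, the distortion matrix, the grid over $s$, and the helper names are all assumptions introduced for illustration. The code picks a distortion level generated by a known parameter value, minimizes $sD + \sum_x p(x) \ln Z_x(s)$ over a grid, and estimates the slope of the resulting rate by a central difference.

```python
import numpy as np

# Hypothetical toy problem (same spirit as a 3-letter source, 2-letter reproduction).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.4])
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.5, 2.0]])
S_GRID = np.linspace(0.0, 5.0, 5001)

def objective(s, D):
    """s*D + sum_x p(x) ln Z_x(s), the quantity minimized over s >= 0."""
    Z = (q[None, :] * np.exp(-s * d)).sum(axis=1)
    return s * D + float(p @ np.log(Z))

def distortion_at(s):
    """D_s = E_s{d(X,Y)} under p(x) w_s(y|x)."""
    w = q[None, :] * np.exp(-s * d)
    w /= w.sum(axis=1, keepdims=True)
    return float(p @ (w * d).sum(axis=1))

def rate(D):
    """R_q(D) approximated as minus the grid minimum of the objective."""
    return -min(objective(s, D) for s in S_GRID)

s_true = 1.5
D = distortion_at(s_true)

values = np.array([objective(s, D) for s in S_GRID])
s_star = float(S_GRID[np.argmin(values)])        # grid minimizer, expected near s_true

h = 1e-3
slope = (rate(D + h) - rate(D - h)) / (2 * h)    # expected near -s_true

print(s_star, slope)
```

A finer grid over $s$ tightens both the located minimizer and the slope estimate; the grid search is used here only because it is easy to verify, not because it is efficient.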

It should be emphasized that the pmf $q$, that plays a role
in the definition of $w_s(y|x)$ (and hence also the definition of
$\mathrm{mmse}_s(\Delta|X)$) should be kept fixed throughout the integration,
independently of the integration variable $s$, since it is the
same pmf as in the definition of $R_q(D)$. Thus, even if $q$ is
known to be optimum for a given target distortion $D$ (and
then it yields $R(D)$), the pmf $q$ must be kept unaltered
throughout the integration, in spite of the fact that for other
values of $s$ (which correspond to other distortion levels), the
optimum reproduction pmf might be different. In particular,
note that the marginal of $Y$, that is induced from the joint pmf
$p_s(x,y)$, may not necessarily agree with $q$. Thus, $p_s(x,y)$
should only be considered as an auxiliary joint distribution
that defines $\mathrm{mmse}_s(\Delta|X)$.
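The remark about the induced marginal of $Y$ is easy to see on a two-letter example. The sketch below uses hypothetical numbers (a skewed binary source, a uniform $q$, Hamming distortion, $s = 1$) and simply computes the $Y$-marginal of the auxiliary joint pmf; for these values it comes out at roughly (0.64, 0.36) rather than the uniform $q$.

```python
import numpy as np

p = np.array([0.8, 0.2])            # source pmf (hypothetical)
q = np.array([0.5, 0.5])            # fixed reproduction pmf used to define w_s
d = 1.0 - np.eye(2)                 # Hamming distortion matrix
s = 1.0

w = q[None, :] * np.exp(-s * d)     # unnormalized w_s(y|x)
w /= w.sum(axis=1, keepdims=True)   # divide each row by Z_x(s)
marginal_y = p @ w                  # Y-marginal induced by p(x) w_s(y|x)

print(marginal_y, q)
```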

Using Theorem 1 for Bounds on $R_q(D)$

As was briefly explained in the Introduction (and will also
be demonstrated in the next section), Theorem 1 may set
the stage for the derivation of upper and lower bounds to
$R_q(D)$ for a general reproduction distribution $q$, and hence
also for the rate-distortion function $R(D)$ when the optimum
$q$ happens to be known or is easily derivable (e.g., from
symmetry and convexity considerations).

The basic underlying idea is that bounds on $R_q(D)$ may be
induced from bounds on $\mathrm{mmse}_s(\Delta|X)$ across the integration
interval. The bounds on the MMSE may either be derived from
purely technical considerations, upon analyzing the expression
of the MMSE directly, or by using estimation-theoretic tools.
In the latter case, lower bounds may be obtained from fundamental
lower bounds to the MMSE, like the Bayesian Cramér-Rao
bound, or more advanced lower bounds available from the
estimation theory literature, for example, the Weiss-Weinstein
bound [12], [13], whenever applicable. Upper bounds may be
obtained by analyzing the mean square error (MSE) of a
specific (sub-optimum) estimator, which is relatively easy to
analyze, or more generally by analyzing the performance of
the best estimator within a certain limited class of estimators,
like the class of linear estimators of the 'observation' $X$, or a
certain fixed function of $X$.
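The upper-bounding idea just described can be illustrated with two simple sub-optimum estimators of $\Delta$ from $X$: the best constant and the best affine function of $X$. The sketch below is only an illustration under assumed numbers (the alphabets, pmfs, and the value of $s$ are hypothetical); it uses the standard fact that the MSE of the best affine estimator equals the variance of $\Delta$ minus the squared covariance divided by the variance of $X$.

```python
import numpy as np

# Hypothetical numeric alphabets with quadratic distortion.
x_vals = np.array([-1.0, 0.0, 2.0])
y_vals = np.array([-1.0, 1.0])
p = np.array([0.3, 0.5, 0.2])
q = np.array([0.5, 0.5])
d = (x_vals[:, None] - y_vals[None, :]) ** 2
s = 0.7

w = q[None, :] * np.exp(-s * d)
w /= w.sum(axis=1, keepdims=True)            # w_s(y|x)
joint = p[:, None] * w                       # auxiliary joint pmf p_s(x, y)

# Optimum estimator: the conditional mean of Delta given X.
cond_mean = (w * d).sum(axis=1)
mmse = float((joint * (d - cond_mean[:, None]) ** 2).sum())

# Best constant estimator: its MSE is the variance of Delta.
mean_delta = float((joint * d).sum())
mse_const = float((joint * (d - mean_delta) ** 2).sum())

# Best affine estimator c0 + c1 * X.
mean_x = float(p @ x_vals)
var_x = float(p @ (x_vals - mean_x) ** 2)
cov = float((joint * (x_vals[:, None] - mean_x) * (d - mean_delta)).sum())
mse_affine = mse_const - cov ** 2 / var_x

print(mmse, mse_affine, mse_const)           # non-decreasing from left to right
```

Either of the two larger numbers could serve as an upper bound on the MMSE inside the integrals, at the price of a looser bound on the rate or the distortion.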

In Theorem 1 we have deliberately presented two integral
forms for both the rate and the distortion. As $D_s$ is
monotonically decreasing and $R_q(D_s)$ is monotonically
increasing in $s$, the integrals at the first lines of both eqs. (9)
and (12), which include relatively small values of $s$, naturally
lend themselves to derivation of bounds in the low-rate
(high distortion) regime, whereas the second lines of these
equations are more suitable in the low-distortion (high resolution)
region. For example, to derive an upper bound on $R_q(D)$ in
the high-distortion range, one would need a lower bound on
$\mathrm{mmse}_s(\Delta|X)$ to be used in the first line of (9) and an upper
bound on $\mathrm{mmse}_s(\Delta|X)$ to be substituted into the first line of
(12). If one can then derive, from the former, an upper bound
on $s$ as a function of $D$, and substitute it into the upper bound
on the rate in terms of $s$, then this will result in an upper
bound to $R_q(D)$. A similar kind of reasoning is applicable
to the derivation of other types of bounds. This point will be
demonstrated mainly in Examples C and D in the next section.

Comparison to the I-MMSE Relations 

On a more conceptual level, item (b) of Theorem 1 may
remind the familiar reader of well-known results due to
Guo, Shamai and Verdú [6], which are referred to as I-MMSE
relations (as well as later works that generalize these relations).
The similarity between eq. (12) and the I-MMSE relation (in
its basic form) is that in both cases a mutual information
is expressed as an integral whose integrand includes the
MMSE of a certain random variable (or vector) given some
observation(s). However, to the best of our judgment, this is
the only similarity.

In order to sharpen the comparison between the two relations,
it is instructive to look at the special case where all
random variables are Gaussian and the distortion measure
is quadratic: In the context of Theorem 1, consider $Y$ to
be a zero-mean Gaussian RV with variance $\sigma_q^2$, and let
$d(x,y) = (x - y)^2$. As will be seen in Example B of the
next section, this then means that $w_s(y|x)$ can be described
by the additive Gaussian channel $Y = aX + Z$, where
$a = 2s\sigma_q^2/(1 + 2s\sigma_q^2)$ and $Z$ is a zero-mean Gaussian RV,
independent of $X$, and with variance $\sigma_q^2/(1 + 2s\sigma_q^2)$. Here, we
have $\Delta = (Y - X)^2 = [Z - (1 - a)X]^2$. Thus, the integrand
of (12) includes the MMSE in estimating $[Z - (1 - a)X]^2$
based on the channel input $X$. It is therefore about estimating
a certain function of $Z$ and $X$, where $X$ is the observation at
hand and $Z$ is independent of $X$.

This is very different from the paradigm of the I-MMSE
relation: there the channel is $Y = \sqrt{\mathrm{snr}}\,X + Z$, where $Z$
is standard normal, the integration variable is snr, and the
estimated RV is $X$ (or equivalently, $Z$) based on the channel
output, $Y$. Also, by comparing the two channels (normalizing
the noise in $Y = aX + Z$ to unit variance), it is readily
seen that the integration variable $s$, in our setting, can be
related to the integration variable, snr, of the I-MMSE relation
according to

$$\mathrm{snr} = \frac{4s^2\sigma_q^2}{1 + 2s\sigma_q^2}, \qquad (14)$$

and so, the relation between the two integration variables is
highly non-linear. We therefore observe that the two MMSE
results are fairly different.
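The mapping between the two integration variables can be reproduced in a few lines. This is a sketch under the Gaussian setting described above, with hypothetical function names: dividing $Y = aX + Z$ by the standard deviation of $Z$ gives a unit-variance-noise channel whose gain squared plays the role of snr, and the code compares that ratio with the closed form $4s^2\sigma_q^2/(1 + 2s\sigma_q^2)$.

```python
import numpy as np

def snr_from_channel(s, var_q):
    """snr = a^2 / var(Z) for the channel Y = aX + Z induced by w_s(y|x)."""
    a = 2.0 * s * var_q / (1.0 + 2.0 * s * var_q)
    var_z = var_q / (1.0 + 2.0 * s * var_q)
    return a ** 2 / var_z

def snr_closed_form(s, var_q):
    """Closed-form relation between s and snr."""
    return 4.0 * s ** 2 * var_q / (1.0 + 2.0 * s * var_q)

s_values = np.linspace(0.01, 10.0, 50)
lhs = snr_from_channel(s_values, 2.0)
rhs = snr_closed_form(s_values, 2.0)

# Doubling s does not double snr, which is the non-linearity noted in the text.
ratio_when_s_doubles = snr_closed_form(2.0, 1.0) / snr_closed_form(1.0, 1.0)
print(ratio_when_s_doubles)
```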

Analogous MMSE Formula for Channel Capacity 

Eq. (13) can be understood conveniently as an achievable
rate using a simple random coding argument (see Appendix):
The coding rate $R$ should be (slightly larger than) the
large deviations rate function of the probability of the event
$\{\sum_{i=1}^n d(x_i, Y_i) \le nD\}$, where $(x_1, \ldots, x_n)$ is a typical
source sequence and $(Y_1, \ldots, Y_n)$ are drawn i.i.d. from $q$.
As is well known, a similar random coding argument applies
to channel coding (see also [8]): Channel capacity can be
obtained as the large deviations rate function of the event
$\{\sum_{i=1}^n d(X_i, y_i) \le nD\}$, where now $(y_1, \ldots, y_n)$ is a channel
output sequence typical to $q$, $(X_1, \ldots, X_n)$ are drawn i.i.d.
according to a given input pmf $\{p(x)\}$, the distortion measure
is chosen to be $d(x,y) = -\ln w(y|x)$ ($\{w(y|x)\}$ being the
channel transition probabilities) and $D = H(Y|X)$. Thus, the
analogue of (13) is

$$C_q = -\min_{s \ge 0}\left[sH(Y|X) + \sum_y q(y) \ln Z_y(s)\right], \qquad (15)$$

where

$$Z_y(s) = \sum_x p(x) w^s(y|x), \qquad (16)$$

and the minimizing $s$ is always $s^* = 1$. Consequently, the
corresponding integrated MMSE formula would read

$$C_q = \int_0^1 ds \cdot s \cdot \mathrm{mmse}_s[\ln w(Y|X)|Y], \qquad (17)$$

where $\mathrm{mmse}_s[\ln w(Y|X)|Y]$ is defined w.r.t. the joint pmf

$$q_s(x,y) = q(y) \cdot \frac{p(x) w^s(y|x)}{Z_y(s)}. \qquad (18)$$

Eq. (17) seems to be less useful than the analogous rate-distortion
formulas, for a very simple reason: Since the
channel is given, then once the input pmf $p$ is given too
(which is required for the use of (17)), one can simply
compute the mutual information, which is easier than
applying (17). This is different from the situation in the
rate-distortion problem, where even if both $p$ and $q$ are given,
in order to compute $R_q(D)$ in the direct way, one still needs
to minimize the mutual information w.r.t. the channel between
$X$ and $Y$. Eq. (17) is therefore presented here merely for the
purpose of drawing the duality.
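The channel-coding analogue can be sanity-checked on a small discrete memoryless channel. The sketch below is illustrative only: the input pmf, the transition matrix, and the helper names are hypothetical, $q$ is taken to be the output pmf induced by $p$ and the channel, and the tilted posterior of $X$ given $Y = y$ is taken proportional to $p(x) w^s(y|x)$. It checks that the Legendre-type expression is minimized near $s = 1$, that minus the minimum matches the mutual information, and that the integral of $s$ times the MMSE of $\ln w(Y|X)$ given $Y$ over $[0, 1]$ gives the same value.

```python
import numpy as np

p = np.array([0.3, 0.7])                      # input pmf (hypothetical)
w = np.array([[0.9, 0.1],                     # w(y|x); rows indexed by x
              [0.2, 0.8]])
q = p @ w                                     # induced output pmf
H_Y_given_X = -float(p @ (w * np.log(w)).sum(axis=1))
mutual_info = float((p[:, None] * w * np.log(w / q[None, :])).sum())

def objective(s):
    """s*H(Y|X) + sum_y q(y) ln Z_y(s), with Z_y(s) = sum_x p(x) w(y|x)^s."""
    Zy = (p[:, None] * w ** s).sum(axis=0)
    return s * H_Y_given_X + float(q @ np.log(Zy))

def mmse_given_y(s):
    """MMSE of ln w(Y|X) given Y under q(y) p(x) w(y|x)^s / Z_y(s)."""
    post = p[:, None] * w ** s
    post /= post.sum(axis=0, keepdims=True)   # tilted posterior of X given Y = y
    lw = np.log(w)
    mean = (post * lw).sum(axis=0)
    var = (post * lw ** 2).sum(axis=0) - mean ** 2
    return float(q @ var)

grid = np.linspace(0.0, 3.0, 3001)
vals = np.array([objective(s) for s in grid])
s_star = float(grid[np.argmin(vals)])
C_legendre = -float(vals.min())

s_grid = np.linspace(0.0, 1.0, 2001)
integrand = s_grid * np.array([mmse_given_y(s) for s in s_grid])
C_integral = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(s_grid)) / 2.0)

print(s_star, C_legendre, C_integral, mutual_info)
```

As the text notes, computing the mutual information directly is simpler; the sketch only illustrates the duality.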

Analogies With Statistical Mechanics 

As was shown in 111] and further advocated in fSl, the 
Legendre relation (13) has a natural statistical-mechanical
interpretation, where $Z_x(s)$ plays the role of a partition function
of a system (indexed by $x$), $d(x, y)$ is an energy function
(Hamiltonian) and $s$ plays the role of inverse temperature
(normally denoted by $\beta$ in the Physics literature). The minimizing
$s$ is then the equilibrium inverse temperature when $|\mathcal{X}|$
systems (each indexed by $x$, with $n(x) = np(x)$ particles and
Hamiltonian $\mathcal{E}_x(y) = d(x, y)$) are brought into thermal contact
and a total energy of $nD$ is split among them. In this case,
$-R_q(D)$ is the thermodynamical entropy of the combined
system and the MMSE, which is $-dD_s/ds$, is intimately related
to the heat capacity of the system.

An alternative, though similar, interpretation was given in
[9], [10], where the parameter $s$ was interpreted as being
proportional to a generalized force acting on the system (e.g.,
pressure or magnetic field), and the distortion variable is the
conjugate physical quantity influenced by this force (e.g.,
volume in the case of pressure, or magnetization in the case
of a magnetic field). In this case, the minimizing $s$ means
the equal force that each one of the various subsystems is
applying on the others when they are brought into contact and
they equilibrate (e.g., equal pressures between two volumes of
a gas separated by a piston which is free to move). In this case,
$-R_q(D)$ is interpreted as the free energy of the system, and
the MMSE formulas are intimately related to the fluctuation-dissipation
theorem in statistical mechanics.

More concretely, it was shown in [9] that given a source
distribution and a distortion measure, we can describe (at least
conceptually) a concrete physical system that emulates the
rate-distortion problem in the following manner: When no
force is applied to the system, its total length is $nD_0$, where
$n$ is the number of particles in the system (and also the block
length in the rate-distortion problem), and $D_0$ is as defined
above. If one applies to the system a contracting force, that
increases from zero to some final value $\lambda$, such that the length
of the system shrinks to $nD$, where $D < D_0$ is analogous to
a prescribed distortion level, then the following two facts hold
true: (i) An achievable lower bound on the total amount of
mechanical work that must be carried out by the contracting
force in order to shrink the system to length $nD$, is given by

$$W \ge nkTR_q(D), \qquad (19)$$

where $k$ is Boltzmann's constant and $T$ is the temperature. (ii)
The final force $\lambda$ is related to $D$ according to $\lambda = kTR_q'(D)$,
where $R_q'(\cdot)$ is the derivative of $R_q(\cdot)$. Thus, the rate-distortion
function plays the role of a fundamental limit, not
only in Information Theory, but in Physics as well.



IV. Examples

In this section, we provide a few examples for the use
of Theorem 1. The first two examples are simple and well
known, and their purpose is just to demonstrate how to use
this theorem in order to calculate rate-distortion functions.
The third example is aimed to demonstrate how Theorem
1 can be useful as a new method to evaluate the behavior
of a certain rate-distortion function (which is apparently not
straightforward to derive otherwise) at both the low distortion
(a.k.a. high resolution) regime and the high distortion regime.
Specifically, we first derive, for this example, upper and lower
bounds on $R(D)$, which are applicable in certain ranges
of high-distortion. These bounds have the same asymptotic
behavior as $D$ tends to its maximum possible value, and so,
they sandwich the exact high-distortion asymptotic behavior
of the true rate-distortion function. A similar analysis is
then carried out in the low distortion range, and again, the
two bounds have the same limiting behavior in the very low
distortion limit. In the fourth and last example, we show how
Theorem 1 can easily be used to evaluate the high-resolution
behavior of the rate-distortion function for a general power-law
distortion measure of the form $d(x, y) = |x - y|^r$.

A. Binary Symmetric Source and Hamming Distortion

Perhaps the simplest example is that of the binary symmetric
source (BSS) and the Hamming distortion measure. In this
case, the optimum $q$ is also symmetric. Here $\Delta = d(X, Y)$ is
a binary RV with

$$\Pr\{\Delta = 1|X = x\} = \frac{e^{-s}}{1 + e^{-s}} \qquad (20)$$

independently of $x$. Thus, the MMSE estimator of $d(X, Y)$
based on $X$ is

$$\hat{\Delta} = \frac{e^{-s}}{1 + e^{-s}}, \qquad (21)$$

regardless of $X$, and so the resulting MMSE (which is simply
the variance in this case) is easily found to be

$$\mathrm{mmse}_s(\Delta|X) = \frac{e^{-s}}{(1 + e^{-s})^2}. \qquad (22)$$

Accordingly,

$$D = \frac{1}{2} - \int_0^s \frac{e^{-\hat{s}}\, d\hat{s}}{(1 + e^{-\hat{s}})^2} = \frac{e^{-s}}{1 + e^{-s}} \qquad (23)$$

and

$$R(D) = \int_0^s \frac{\hat{s} e^{-\hat{s}}\, d\hat{s}}{(1 + e^{-\hat{s}})^2} = \ln 2 + \frac{s}{1 + e^{-s}} - \ln(1 + e^{s}) = \ln 2 - h_2\left(\frac{e^{-s}}{1 + e^{-s}}\right) = \ln 2 - h_2(D), \qquad (24)$$

where $h_2(u) = -u \ln u - (1 - u) \ln(1 - u)$ is the binary entropy
function.
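The closed forms of this example are easy to confirm numerically. The following sketch (function names and the chosen value of $s$ are illustrative assumptions) integrates the MMSE $e^{-s}/(1+e^{-s})^2$ with a trapezoidal rule and compares the results with $D = e^{-s}/(1+e^{-s})$ and $R(D) = \ln 2 - h_2(D)$.

```python
import numpy as np

def mmse_bss(s):
    """MMSE of the Hamming distortion given X for the BSS with symmetric q."""
    return np.exp(-s) / (1.0 + np.exp(-s)) ** 2

def h2(u):
    """Binary entropy in nats."""
    return -u * np.log(u) - (1.0 - u) * np.log(1.0 - u)

def trapezoid(values, grid):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

s = 2.0
grid = np.linspace(0.0, s, 4001)
m = mmse_bss(grid)

D_integral = 0.5 - trapezoid(m, grid)         # D_0 = 1/2 minus integrated MMSE
R_integral = trapezoid(grid * m, grid)        # integral of s * mmse

D_closed = np.exp(-s) / (1.0 + np.exp(-s))
R_closed = np.log(2.0) - h2(D_closed)

print(D_integral, D_closed)
print(R_integral, R_closed)
```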



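B. Quadratic Distortion and Gaussian Reproduction

Another classic example concerns a general source with
$\sigma_x^2 = E\{X^2\} < \infty$, the quadratic distortion $d(x, y) = (x - y)^2$,
and a Gaussian reproduction distribution, namely,
$q(y)$ is the pdf of a zero-mean Gaussian RV with variance
$\sigma_q^2 = \sigma_x^2 - D$, for a given $D < \sigma_x^2$. In this case, it is well
known that $R_q(D) = \frac{1}{2} \ln \frac{\sigma_x^2}{D}$ (even without assuming that the
source $X$ is Gaussian). We now demonstrate how this result
is obtained from the MMSE formula of Theorem 1. (We are not
arguing here that this is the simplest way to calculate $R_q(D)$
in this example; the purpose is merely to demonstrate how
Theorem 1 can be used.)

First, observe that since $q(y)$ is the pdf pertaining to
$\mathcal{N}(0, \sigma_x^2 - D)$, then

$$w_s(y|x) = \frac{q(y) e^{-s(y - x)^2}}{\int_{-\infty}^{\infty} dy'\, q(y') e^{-s(y' - x)^2}} \qquad (25)$$

is easily found to correspond to the Gaussian additive channel

$$Y = \frac{2s(\sigma_x^2 - D)}{1 + 2s(\sigma_x^2 - D)} \cdot X + Z, \qquad (26)$$

where $Z$ is a zero-mean Gaussian RV with variance $\sigma_z^2 = (\sigma_x^2 - D)/[1 + 2s(\sigma_x^2 - D)]$,
and $Z$ is independent of $X$.
Now,

$$\begin{aligned} \Delta &= (Y - X)^2 \\ &= \left(\frac{2s(\sigma_x^2 - D)}{1 + 2s(\sigma_x^2 - D)} \cdot X + Z - X\right)^2 \\ &= \left(Z - \frac{X}{1 + 2s(\sigma_x^2 - D)}\right)^2 \\ &= (Z - \alpha X)^2 \\ &= Z^2 - 2\alpha XZ + \alpha^2 X^2, \end{aligned} \qquad (27)$$

where $\alpha = 1/[1 + 2s(\sigma_x^2 - D)]$. Thus, the MMSE estimator
of $\Delta$ given $X$ is obtained by

$$\begin{aligned} \hat{\Delta} &= E\{\Delta|X\} \\ &= E\{Z^2|X\} - 2\alpha X E\{Z|X\} + \alpha^2 X^2 \\ &= E\{Z^2\} - 2\alpha X E\{Z\} + \alpha^2 X^2 \\ &= E\{Z^2\} + \alpha^2 X^2 \\ &= \sigma_z^2 + \alpha^2 X^2, \end{aligned} \qquad (28)$$

which yields

$$\begin{aligned} \mathrm{mmse}_s(\Delta|X) &= E\{(\Delta - \hat{\Delta})^2\} \\ &= E\{(Z^2 - 2\alpha XZ + \alpha^2 X^2 - \sigma_z^2 - \alpha^2 X^2)^2\} \\ &= 2\sigma_z^4 + 4\alpha^2 \sigma_x^2 \sigma_z^2 \\ &= \frac{2(\sigma_x^2 - D)^2}{[1 + 2s(\sigma_x^2 - D)]^2} + \frac{4\sigma_x^2(\sigma_x^2 - D)}{[1 + 2s(\sigma_x^2 - D)]^3}. \end{aligned} \qquad (29)$$

Now, in our case, $D_0 = \sigma_x^2 + \sigma_q^2 = 2\sigma_x^2 - D$, and so, for
$s = 1/(2D)$, we get

$$\begin{aligned} D_s &= D_0 - \int_0^s d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) \\ &= 2\sigma_x^2 - D - 2(\sigma_x^2 - D)^2 \int_0^{1/(2D)} \frac{d\hat{s}}{[1 + 2\hat{s}(\sigma_x^2 - D)]^2} - 4\sigma_x^2(\sigma_x^2 - D) \int_0^{1/(2D)} \frac{d\hat{s}}{[1 + 2\hat{s}(\sigma_x^2 - D)]^3} \\ &= 2\sigma_x^2 - D + \left.\frac{\sigma_x^2 - D}{1 + 2\hat{s}(\sigma_x^2 - D)}\right|_0^{1/(2D)} + \left.\frac{\sigma_x^2}{[1 + 2\hat{s}(\sigma_x^2 - D)]^2}\right|_0^{1/(2D)}, \end{aligned} \qquad (30)$$

which, after some straightforward algebra, gives $D_s = D$. I.e.,
$s$ and $D$ are indeed related by $s = 1/(2D)$, or $D = 1/(2s)$.
Finally,

$$\begin{aligned} R_q(D) &= \int_0^s d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) \\ &= 2(\sigma_x^2 - D)^2 \int_0^{1/(2D)} \frac{\hat{s}\, d\hat{s}}{[1 + 2\hat{s}(\sigma_x^2 - D)]^2} + 4\sigma_x^2(\sigma_x^2 - D) \int_0^{1/(2D)} \frac{\hat{s}\, d\hat{s}}{[1 + 2\hat{s}(\sigma_x^2 - D)]^3} \\ &= \frac{1}{2}\left\{\ln[1 + 2\hat{s}(\sigma_x^2 - D)] + \frac{1}{1 + 2\hat{s}(\sigma_x^2 - D)}\right\}_0^{1/(2D)} + \frac{\sigma_x^2}{\sigma_x^2 - D}\left\{\frac{1}{2[1 + 2\hat{s}(\sigma_x^2 - D)]^2} - \frac{1}{1 + 2\hat{s}(\sigma_x^2 - D)}\right\}_0^{1/(2D)}, \end{aligned} \qquad (31)$$

which yields, after a simple algebraic manipulation, $R_q(D) = \frac{1}{2} \ln \frac{\sigma_x^2}{D}$.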
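The algebra of this example can be double-checked by numerical integration. The sketch below assumes illustrative values $\sigma_x^2 = 2$ and $D = 0.5$ (both hypothetical) and uses the closed-form MMSE derived above; it verifies that integrating up to $s = 1/(2D)$ returns the distortion $D$ and the rate $\frac{1}{2}\ln(\sigma_x^2/D)$.

```python
import numpy as np

var_x, D = 2.0, 0.5                  # hypothetical source variance and target distortion
c = var_x - D                        # variance of the Gaussian reproduction pdf q
s_end = 1.0 / (2.0 * D)

def mmse_gauss(s):
    """Closed-form MMSE of Delta given X for Gaussian q and quadratic distortion."""
    t = 1.0 + 2.0 * s * c
    return 2.0 * c ** 2 / t ** 2 + 4.0 * var_x * c / t ** 3

def trapezoid(values, grid):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

grid = np.linspace(0.0, s_end, 20001)
m = mmse_gauss(grid)

D0 = 2.0 * var_x - D                 # distortion at s = 0
D_s = D0 - trapezoid(m, grid)        # should reproduce D
R_s = trapezoid(grid * m, grid)      # should reproduce 0.5 * ln(var_x / D)
R_closed = 0.5 * np.log(var_x / D)

print(D_s, D)
print(R_s, R_closed)
```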



C. Quadratic Distortion and Binary Reproduction 

In this example, we again assume the quadratic distortion
measure, but now, instead of Gaussian reproduction codewords,
we impose binary reproduction, $y \in \{-a, +a\}$, where
$a$ is a given constant. (The derivation, in this example, can be
extended to apply also to larger finite reproduction alphabets.)
Clearly, if the pdf of the source $X$ is
symmetric about the origin, then the best output distribution
is also symmetric, i.e., $q(+a) = q(-a) = 1/2$. Thus,
$R_q(D) = R(D)$ for every $D$, given this choice of $q$. The
channel $w_s(y|x)$ is now given by

$$w_s(y|x) = \frac{e^{-s(y - x)^2}}{e^{-s(x - a)^2} + e^{-s(x + a)^2}} = \frac{e^{2sxy}}{2\cosh(2asx)}. \qquad (32)$$

Note that in this case, the minimum possible distortion (obtained
for $s \to \infty$) is given by $D_\infty = E\{[X - a\,\mathrm{sgn}(X)]^2\}$.
Thus, the rate-distortion function is actually defined only for
$D \ge D_\infty$. The maximum distortion of interest is $D_0 = \sigma_x^2 + a^2$,
pertaining to the choice $s = 0$, where $X$ and $Y$ are
independent. To the best of our knowledge, there is no closed
form expression for $R(D)$ in this example. The parametric
representation of $D_s$ and $R(D_s)$, both as functions of $s$, does
not seem to lend itself to an explicit formula of $R(D)$. The
reason is that

$$\begin{aligned} D_s = E\{(Y - X)^2\} &= \sigma_x^2 + a^2 - 2E\{XY\} \\ &= \sigma_x^2 + a^2 - 2E\{X \cdot E\{Y|X\}\} \\ &= \sigma_x^2 + a^2 - 2aE\{X \tanh(2asX)\} \end{aligned} \qquad (33)$$

and there is no apparent closed-form expression of $s$ as a function
of $D$, which can be substituted into the expression of $R(D_s)$.
Consider the MMSE estimator of $\Delta = (Y - X)^2 = X^2 + a^2 - 2XY$:

$$\begin{aligned} \hat{\Delta} &= E\{(Y - X)^2|X\} \\ &= X^2 + a^2 - 2XE\{Y|X\} \\ &= X^2 + a^2 - 2aX\tanh(2asX). \end{aligned} \qquad (34)$$

The MMSE is then

$$\begin{aligned} \mathrm{mmse}_s(\Delta|X) &= E\{[2X(Y - a\tanh(2asX))]^2\} \\ &= 4a^2[\sigma_x^2 - E\{X^2 \tanh^2(2asX)\}]. \end{aligned} \qquad (35)$$

We first use this expression to obtain upper and lower bounds
on $R(D)$ which are asymptotically exact in the range of high
distortion levels (small $s$). Subsequently, we do the same for
the range of low distortion (large $s$).

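High Distortion. Consider first the high distortion regime. For
small $s$, we can safely upper bound $\tanh^2(2asX)$ by $(2asX)^2$
and get

$$\mathrm{mmse}_s(\Delta|X) \ge 4a^2(\sigma_x^2 - 4a^2 s^2 E\{X^4\}) = 4a^2(\sigma_x^2 - 4a^2 s^2 \rho_x^4), \qquad (36)$$

where $\rho_x^4 = E\{X^4\}$. This results in the following lower bound
to $R(D_s)$:

$$\begin{aligned} R(D_s) &= \int_0^s d\hat{s} \cdot \hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) \\ &\ge \int_0^s d\hat{s} \cdot \hat{s}\,[4a^2\sigma_x^2 - 16a^4\rho_x^4\hat{s}^2] \\ &= 2a^2\sigma_x^2 s^2 - 4a^4\rho_x^4 s^4 \triangleq r(s). \end{aligned} \qquad (37)$$

To get a lower bound to $D_s$, we need an upper bound to the
MMSE. An obvious upper bound (which is tight for small $s$)
is given by $4a^2\sigma_x^2$, which yields:

$$\begin{aligned} D_s &= D_0 - \int_0^s d\hat{s} \cdot \mathrm{mmse}_{\hat{s}}(\Delta|X) \\ &\ge D_0 - \int_0^s d\hat{s} \cdot (4a^2\sigma_x^2) \\ &= D_0 - 4a^2\sigma_x^2 s \end{aligned} \qquad (38)$$

or

$$s \ge \frac{D_0 - D_s}{4a^2\sigma_x^2}. \qquad (39)$$

Consider now the range $s \in [0, \sigma_x/(2a\rho_x^2)]$, which is the range
where $r(s)$ is monotonically increasing as a function of $s$. In
this range, a lower bound on $s$ would yield a lower bound
on $r(s)$, and hence a lower bound to $R(D_s)$. Specifically, for
$s \in [0, \sigma_x/(2a\rho_x^2)]$, we get

$$R(D_s) \ge r(s) \ge r\left(\frac{D_0 - D_s}{4a^2\sigma_x^2}\right) = \frac{(D_0 - D_s)^2}{8a^2\sigma_x^2} - \frac{\rho_x^4(D_0 - D_s)^4}{64a^4\sigma_x^8}. \qquad (40)$$

In other words, we obtain the lower bound

$$R(D) \ge \frac{(D_0 - D)^2}{8a^2\sigma_x^2} - \frac{\rho_x^4(D_0 - D)^4}{64a^4\sigma_x^8} \triangleq R_L(D) \qquad (41)$$

for the range of distortions $D \in [D_0 - 2a\sigma_x^3/\rho_x^2, D_0]$. It is
obvious that, at least in some range of high distortion levels,
this bound is better than the Shannon lower bound,

$$R_S(D) = h(X) - \frac{1}{2}\ln(2\pi e D), \qquad (42)$$

where $h(X)$ is the differential entropy of $X$. This can be
seen right away from the fact that $R_S(D)$ vanishes at $D = (2\pi e)^{-1} e^{2h(X)} \le \sigma_x^2$,
whereas the bound $R_L(D)$ of (41)
vanishes at $D_0 = \sigma_x^2 + a^2$, which is strictly larger.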

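By applying the above-mentioned upper bound to the
MMSE in the rate equation, and the lower bound to the MMSE
in the distortion equation, we can also get an upper bound
to $R(D)$ in the high-distortion range, in a similar manner.
Specifically,

$$R(D_s) \le \int_0^s d\hat{s} \cdot \hat{s}\,(4a^2\sigma_x^2) = 2a^2\sigma_x^2 s^2 \qquad (43)$$

and

$$\begin{aligned} D_s &\le D_0 - \int_0^s d\hat{s}\,(4a^2\sigma_x^2 - 16a^4\rho_x^4\hat{s}^2) \\ &= D_0 - 4a^2\sigma_x^2 s + \frac{16}{3}a^4\rho_x^4 s^3 \triangleq \delta(s). \end{aligned} \qquad (44)$$

Considering again the range $s \in [0, \sigma_x/(2a\rho_x^2)]$, where $\delta(s)$
is monotonically decreasing, the inverse function $\delta^{-1}(D)$ is
monotonically decreasing as well, and so an upper bound on
$R(D)$ will be obtained by substituting $\delta^{-1}(D)$ instead of $s$
in the bound on the rate, i.e., $R(D) \le 2a^2\sigma_x^2[\delta^{-1}(D)]^2$.
To obtain an explicit expression for $\delta^{-1}(D)$, we need to
solve a cubic equation in $s$ and select the relevant solution
among the three. Fortunately, since this cubic equation has
no quadratic term, the expression of the solution can be found
trigonometrically and it is relatively simple (see, e.g., Q p. 9]):
Specifically, the cubic equation $s^3 + As + B = 0$ has solutions
of the form $s = m\cos\theta$, where $m = 2\sqrt{-A/3}$ and $\theta$ is any
solution to the equation $\cos(3\theta) = 3B/(Am)$. In other words, the
three solutions to the above cubic equation are $s_i = m\cos\theta_i$,
where

$$\theta_i = \frac{1}{3}\cos^{-1}\left(\frac{3B}{Am}\right) + \frac{2\pi(i - 1)}{3}, \quad i = 1, 2, 3, \qquad (45)$$

with $\cos^{-1}(t)$ being defined as the unique solution to the
equation $\cos\alpha = t$ in the range $\alpha \in [0, \pi]$. In our case,

$$A = -\frac{3\sigma_x^2}{4a^2\rho_x^4}, \qquad B = \frac{3(D_0 - D)}{16a^4\rho_x^4}, \qquad (46)$$

and so, the relevant solution for $s$ (i.e., the one that tends to
zero as $D \to D_0$), which is $\delta^{-1}(D)$, is given by

$$\begin{aligned} \delta^{-1}(D) &= \frac{\sigma_x}{a\rho_x^2}\cos\left[\frac{1}{3}\cos^{-1}\left(\frac{3\rho_x^2(D - D_0)}{4a\sigma_x^3}\right) + \frac{4\pi}{3}\right] \\ &= \frac{\sigma_x}{a\rho_x^2}\cos\left[\frac{1}{3}\left(\frac{\pi}{2} + \sin^{-1}\left(\frac{3\rho_x^2(D_0 - D)}{4a\sigma_x^3}\right)\right) + \frac{4\pi}{3}\right] \\ &= \frac{\sigma_x}{a\rho_x^2}\sin\left[\frac{1}{3}\sin^{-1}\left(\frac{3\rho_x^2(D_0 - D)}{4a\sigma_x^3}\right)\right], \end{aligned} \qquad (47)$$

where $\sin^{-1}(t)$ is defined as the unique solution to the
equation $\sin\alpha = t$ in the range $\alpha \in [-\pi/2, \pi/2]$. This yields
the upper bound

$$R(D) \le \frac{2\sigma_x^4}{\rho_x^4}\sin^2\left[\frac{1}{3}\sin^{-1}\left(\frac{3\rho_x^2(D_0 - D)}{4a\sigma_x^3}\right)\right] \triangleq R_U(D) \qquad (48)$$

for the range of distortions $D \in [D_0 - 4a\sigma_x^3/(3\rho_x^2), D_0]$.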
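As a quick consistency check of the trigonometric inversion, the sketch below evaluates the cubic $\delta(s) = D_0 - 4a^2\sigma_x^2 s + \frac{16}{3}a^4\rho_x^4 s^3$ and the closed form $\frac{\sigma_x}{a\rho_x^2}\sin[\frac{1}{3}\sin^{-1}(\cdot)]$ for its inverse, and verifies that composing them returns the original $s$. The numerical values ($a = \sigma_x = 1$, $\rho_x^4 = 3$) and the function names are assumptions for illustration; the test grid stops slightly short of the endpoint $\sigma_x/(2a\rho_x^2)$, where the arcsine argument reaches 1 and rounding could push it outside its domain.

```python
import numpy as np

a, sigma_x = 1.0, 1.0
rho2 = np.sqrt(3.0)                  # rho_x^2, i.e. sqrt(E{X^4}) with E{X^4} = 3
D0 = sigma_x ** 2 + a ** 2

def delta(s):
    """Cubic upper bound on D_s as a function of s."""
    return D0 - 4.0 * a ** 2 * sigma_x ** 2 * s + (16.0 / 3.0) * a ** 4 * rho2 ** 2 * s ** 3

def delta_inv(D):
    """Inverse of delta on [0, sigma_x / (2 a rho_x^2)], via the trigonometric formula."""
    arg = 3.0 * rho2 * (D0 - D) / (4.0 * a * sigma_x ** 3)
    return sigma_x / (a * rho2) * np.sin(np.arcsin(arg) / 3.0)

s_max = sigma_x / (2.0 * a * rho2)
s_vals = np.linspace(0.0, 0.95 * s_max, 50)
roundtrip = delta_inv(delta(s_vals))

print(np.max(np.abs(roundtrip - s_vals)))
```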

For very small $s$, since the upper and the lower bound to
the MMSE asymptotically coincide (namely, $\mathrm{mmse}_s(\Delta|X) \approx
4a^2\sigma_x^2$), both $R_U(D)$ and $R_L(D)$ exhibit the same
behavior near $D = D_0$, and hence so does the true rate-distortion
function, $R(D)$, which is

$$R(D) \approx \frac{(D_0 - D)^2}{8a^2\sigma_x^2} \tag{49}$$

or, stated more rigorously,

$$\lim_{D\uparrow D_0}\frac{R(D)}{(D_0 - D)^2} = \frac{1}{8a^2\sigma_x^2}. \tag{50}$$

Note that the high-distortion behavior of $R(D)$ depends on
the pdf of $X$ only via its second order moment $\sigma_x^2$. On the
other hand, the upper and lower bounds, $R_U(D)$ and $R_L(D)$,
depend only on $\sigma_x^2$ and the fourth order moment, $\rho_x^4$.

In Fig. 1 we display the upper bound $R_U(D)$ (solid curve)
and the lower bound $R_L(D)$ (dashed curve) for the choice
$a^2 = \sigma_x^2 = 1$ (hence $D_0 = a^2 + \sigma_x^2 = 2$) and $\rho_x^4 = 3$, which
is suitable for the Gaussian source. The range of displayed
distortions, $[1.25, 2]$, is part of the range where both bounds
are valid in this numerical example. As can be seen, the
functions $R_L(D)$ and $R_U(D)$ are very close throughout the
interval $[1.7, 2]$, which is a fairly wide range of distortion
levels. The corresponding Shannon lower bound, in this case,
which is $R_S(D) = \max\{0, \frac{1}{2}\ln\frac{1}{D}\}$, vanishes for all $D \ge 1$
and hence also in the range displayed in the graph.
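For readers who wish to reproduce the upper curve of this numerical example, the following minimal sketch tabulates $R_U(D)$ of eq. (48) for $a^2 = \sigma_x^2 = 1$ and $\rho_x^4 = 3$, together with the ratio $R_U(D)/(D_0 - D)^2$ and the Shannon lower bound. The function names are our own, and $R_L(D)$ is not included since its expression is given elsewhere in the paper.

```python
import math

def rate_upper(d, a=1.0, sigma2=1.0, rho4=3.0):
    """R_U(D) of eq. (48) in nats, with D0 = a^2 + sigma_x^2."""
    d0 = a**2 + sigma2
    t = 3 * math.sqrt(rho4) * (d0 - d) / (4 * a * sigma2**1.5)
    return 2 * sigma2**2 / rho4 * math.sin(math.asin(t) / 3) ** 2

def shannon_lower(d, sigma2=1.0):
    """Shannon lower bound for a Gaussian source: max{0, (1/2) ln(sigma_x^2 / D)}."""
    return max(0.0, 0.5 * math.log(sigma2 / d))

for d in (1.25, 1.5, 1.75, 1.9, 1.99):
    ru = rate_upper(d)
    ratio = ru / (2.0 - d) ** 2
    print(f"D={d:.2f}  R_U={ru:.5f}  R_U/(D0-D)^2={ratio:.4f}  R_S={shannon_lower(d):.1f}")
```

At $D = 1.25$ the argument of $\sin^{-1}$ equals $9\sqrt{3}/16$, for which the bound evaluates to exactly $1/8$ nat, and the ratio in the third column stays close to $1/(8a^2\sigma_x^2) = 1/8$ as $D$ approaches $D_0$, in agreement with eq. (50).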



Fig. 1. The upper bound $R_U(D)$ (solid curve) and the lower bound $R_L(D)$
(dashed curve) in the high-distortion regime for $a^2 = \sigma_x^2 = 1$ and $\rho_x^4 = 3$.
The Shannon lower bound vanishes in this distortion range.

Low Distortion. We now consider the small distortion regime,
where $s$ is very large. Define the function

$$f(u) = \left(\frac{1-u}{1+u}\right)^2, \qquad u \in [0,1), \tag{51}$$

and consider the Taylor series expansion of $f(u)$ around $u = 0$,
which, for the sake of convenience, will be represented as

$$f(u) = 1 - \sum_{n=1}^{\infty}\phi_n u^n. \tag{52}$$

The coefficients $\{\phi_n\}$ will be determined explicitly in the
sequel. Now, clearly, $\tanh^2(2asx) = f(e^{-4as|x|})$, and so we
have

$$\mathrm{mmse}_s(\Delta|X) = 4a^2\left[\sigma_x^2 - E\{X^2 f(\exp\{-4as|X|\})\}\right] = 4a^2\sum_{n=1}^{\infty}\phi_n E\left\{X^2 e^{-4ans|X|}\right\}. \tag{53}$$



n=l 



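To continue from this point, we will have to let $X$ assume
a certain pdf. For convenience, let us select $X$ to have the
Laplacian pdf with parameter $\theta$, i.e.,

$$p(x) = \frac{\theta}{2}e^{-\theta|x|}. \tag{54}$$

We then obtain

$$\mathrm{mmse}_s(\Delta|X) = 2a^2\theta\sum_{n=1}^{\infty}\phi_n\int_{-\infty}^{+\infty}\mathrm{d}x\,x^2 e^{-(\theta+4ans)|x|} = 8a^2\theta\sum_{n=1}^{\infty}\frac{\phi_n}{(\theta+4ans)^3}. \tag{55}$$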
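As a sanity check on eq. (55), one may compare the series with a direct numerical evaluation of $4a^2 E\{X^2[1 - \tanh^2(2asX)]\}$ under the Laplacian pdf (54). The sketch below assumes the coefficient values $\phi_n = 4n(-1)^{n+1}$ that are derived later in this subsection; the function names, truncation length and integration grid are illustrative choices of ours.

```python
import math

def mmse_series(s, a, theta, n_terms=4000):
    """Right-hand side of eq. (55), truncated, with phi_n = 4 n (-1)^(n+1)."""
    total = 0.0
    for n in range(1, n_terms + 1):
        phi_n = 4 * n * (-1) ** (n + 1)
        total += phi_n / (theta + 4 * a * n * s) ** 3
    return 8 * a**2 * theta * total

def mmse_direct(s, a, theta, x_max=40.0, n_steps=200000):
    """4 a^2 E{X^2 [1 - tanh^2(2 a s X)]} for the Laplacian pdf (54).

    The integrand is even in x, so we integrate over [0, x_max] with the
    composite Simpson rule and fold the factor (theta / 2) * 2 into theta.
    """
    h = x_max / n_steps

    def g(x):
        return x * x * math.exp(-theta * x) * (1.0 - math.tanh(2 * a * s * x) ** 2)

    acc = g(0.0) + g(x_max)
    for k in range(1, n_steps):
        acc += (4 if k % 2 else 2) * g(k * h)
    return 4 * a**2 * theta * acc * h / 3

for s in (0.5, 1.0):
    print(s, mmse_series(s, 1.0, 1.0), mmse_direct(s, 1.0, 1.0))
```

The two columns should agree to within the truncation error of the alternating series, which is bounded by the magnitude of the first omitted term.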



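Thus,

$$\begin{aligned}
R(D_s) &= R(D_\infty) - \int_s^{\infty}\mathrm{d}\hat{s}\cdot\hat{s}\cdot\mathrm{mmse}_{\hat{s}}(\Delta|X) \\
&= 1 - 8a^2\theta\sum_{n=1}^{\infty}\phi_n\int_s^{\infty}\frac{\mathrm{d}\hat{s}\cdot\hat{s}}{(\theta+4an\hat{s})^3} \\
&= 1 - \frac{\theta}{2}\sum_{n=1}^{\infty}\frac{\phi_n}{n^2}\left[\frac{1}{\theta+4ans} - \frac{\theta}{2(\theta+4ans)^2}\right].
\end{aligned} \tag{56}$$

Thus far, our derivation has been exact. We now make an
approximation that applies for large $s$ by neglecting the terms
proportional to $(\theta+4ans)^{-2}$ and by neglecting $\theta$ compared
to $4ans$ in the denominators of $1/(\theta+4ans)$. This results in
the approximation

$$R(D_s) \approx 1 - \frac{\theta}{8as}\sum_{n=1}^{\infty}\frac{\phi_n}{n^3}. \tag{57}$$

Let us denote $C = \frac{\theta}{8a}\sum_{n=1}^{\infty}\frac{\phi_n}{n^3}$. Then, $R(D_s) \approx 1 - C/s$.
Applying a similar calculation to $D_s = D_\infty + \int_s^{\infty}\mathrm{d}\hat{s}\cdot\mathrm{mmse}_{\hat{s}}(\Delta|X)$
yields, in a similar manner, the approximation

$$D_s \approx D_\infty + \frac{C}{2s^2}. \tag{58}$$

It is easy now to express $s$ as a function of $D$ and substitute
into the rate equation to obtain

$$R(D) \approx 1 - \sqrt{2C(D - D_\infty)}. \tag{59}$$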



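Finally, it remains to determine the coefficients $\{\phi_n\}$ and then
the constant $C$. The coefficients can easily be obtained by
using the identity $(1+u)^{-2} = \sum_{n=0}^{\infty}(-1)^n(n+1)u^n$ ($|u| < 1$),
which yields, after simple algebra, $\phi_n = 4n(-1)^{n+1}$. Thus,

$$C = \frac{\theta}{2a}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2} = \frac{\pi^2\theta}{24a}, \tag{60}$$

and we have obtained a precise characterization of $R(D)$ in
the high-resolution regime:

$$\lim_{D\downarrow D_\infty}\frac{1 - R(D)}{\sqrt{D - D_\infty}} = \sqrt{2C} = \frac{\pi}{2}\cdot\sqrt{\frac{\theta}{3a}}. \tag{61}$$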
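The "simple algebra" behind $\phi_n = 4n(-1)^{n+1}$ and the value of $C$ in eq. (60) can be confirmed mechanically. The sketch below is only a check under our own naming: it expands $f(u) = (1-u)^2/(1+u)^2$ by exact power-series division and compares a long partial sum of the series in eq. (60) with its closed form.

```python
import math
from fractions import Fraction

def taylor_coeffs_f(n_max):
    """Exact Taylor coefficients c_0..c_{n_max} of (1-u)^2 / (1+u)^2 around u = 0.

    Uses power-series division: numerator 1 - 2u + u^2, denominator 1 + 2u + u^2.
    Requires n_max >= 2.
    """
    num = [Fraction(1), Fraction(-2), Fraction(1)] + [Fraction(0)] * (n_max - 2)
    den = [Fraction(1), Fraction(2), Fraction(1)] + [Fraction(0)] * (n_max - 2)
    coeffs = []
    for n in range(n_max + 1):
        acc = num[n] - sum(den[k] * coeffs[n - k] for k in range(1, n + 1))
        coeffs.append(acc / den[0])
    return coeffs

def constant_c(theta, a, n_terms=200000):
    """Partial sum of eq. (60): (theta / (2a)) * sum_n (-1)^(n+1) / n^2."""
    partial = sum((-1) ** (n + 1) / n**2 for n in range(1, n_terms + 1))
    return theta / (2 * a) * partial

coeffs = taylor_coeffs_f(10)
# By eq. (52), f(u) = 1 - sum_n phi_n u^n, so phi_n = -c_n for n >= 1.
phi = {n: -coeffs[n] for n in range(1, 11)}
print([int(phi[n]) for n in range(1, 11)])
print(constant_c(1.0, 1.0), math.pi**2 / 24)
```

Exact rational arithmetic is used for the coefficients so that the comparison with $4n(-1)^{n+1}$ is free of rounding error.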



By applying a somewhat more refined analysis, one obtains
(similarly to the above derivation in the high-distortion
regime) upper and lower bounds to $R(D_s)$ and $D_s$, this time
as polynomials in $1/s$. These again lend themselves to the
derivation of upper and lower bounds on $R(D)$, which are
applicable in certain intervals of low distortion. Specifically,
the resulting upper bound is

$$R(D) \le 1 - \sqrt{2C(D - D_\infty)} + C_1(D - D_\infty), \tag{62}$$

where $C_1$ is a constant defined by an alternating series over $n$
[expression illegible in the source], and it is valid in the range
$D \in [D_\infty, D_\infty + C/(2C_1^2)]$. The obtained lower bound, eq. (63),
is a trigonometric expression in $C$ and $C_1$ involving $\cos$ and $\sin^{-1}$
[expression illegible in the source],
and it applies to the range $D \in [D_\infty, D_\infty + C/(12C_1^2)]$. Both
bounds have the same leading term in asymptotic behavior,
which supports eq. (61). The details of this derivation are
omitted since they are very similar to those of the high-distortion
analysis.

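D. High Resolution for a General $L^r$ Distortion Measure

Consider the case where the distortion measure is given by
the $L^r$ metric, $d(x,y) = |x - y|^r$ for some fixed $r > 0$. Let
the reproduction symbols be selected independently at random
according to the uniform pdf

$$q(y) = \begin{cases} \frac{1}{2A} & |y| \le A \\ 0 & \text{elsewhere} \end{cases} \tag{64}$$

Then

$$w_s(y|x) = \frac{e^{-s|y-x|^r}}{\int_{-A}^{+A}\mathrm{d}y'\,e^{-s|y'-x|^r}} \tag{65}$$

and so

$$D_s = \int_{-\infty}^{+\infty}\mathrm{d}x\,p(x)\cdot\frac{\int_{-A}^{+A}\mathrm{d}y\,|x-y|^r e^{-s|y-x|^r}}{\int_{-A}^{+A}\mathrm{d}y\,e^{-s|y-x|^r}} = -\int_{-\infty}^{+\infty}\mathrm{d}x\,p(x)\cdot\frac{\partial}{\partial s}\ln\left[\int_{-A}^{+A}\mathrm{d}y\cdot e^{-s|y-x|^r}\right]. \tag{66}$$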



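Now, in the high-resolution limit, where $s$ is very large, the
integrand $e^{-s|y-x|^r}$ decays very rapidly as $y$ takes values away
from $x$, and so, for every $x \in (-A, +A)$ (which for large
enough $A$, is the dominant interval for the outer integral over
$p(x)\mathrm{d}x$), the boundaries, $-A$ and $+A$, of the inner integral
can be extended to $-\infty$ and $+\infty$ within a negligible error
term (whose derivative w.r.t. $s$ is negligible too). Having done
this, the inner integral no longer depends on $x$, which also
means that the outer integration over $x$ becomes superfluous.
This results in

$$\begin{aligned}
D_s &\approx -\frac{\partial}{\partial s}\ln\left[\int_{-\infty}^{+\infty}\mathrm{d}y\cdot e^{-s|y|^r}\right] \\
&= -\frac{\partial}{\partial s}\ln\left[s^{-1/r}\int_{-\infty}^{+\infty}\mathrm{d}(s^{1/r}y)\,e^{-|s^{1/r}y|^r}\right] \\
&= -\frac{\partial}{\partial s}\ln\left[s^{-1/r}\int_{-\infty}^{+\infty}\mathrm{d}t\cdot e^{-|t|^r}\right] \\
&= -\frac{\partial}{\partial s}\ln\left(s^{-1/r}\right) = \frac{1}{rs}.
\end{aligned} \tag{67}$$

Thus,

$$\mathrm{mmse}_s(\Delta|X) = -\frac{\mathrm{d}D_s}{\mathrm{d}s} = \frac{1}{rs^2}, \tag{68}$$

which yields

$$\frac{\mathrm{d}R_q(D_s)}{\mathrm{d}s} = s\cdot\mathrm{mmse}_s(\Delta|X) = \frac{1}{rs}, \tag{69}$$

and so

$$R_q(D_s) = K + \frac{1}{r}\ln s = K + \frac{1}{r}\ln\left(\frac{1}{rD_s}\right), \tag{70}$$

where $K$ is an integration constant. We have therefore obtained
that in the high-resolution limit, the rate-distortion function
w.r.t. $q$ behaves according to

$$R_q(D) \approx K' - \frac{1}{r}\ln D, \tag{71}$$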
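The key step in this derivation, $D_s \approx 1/(rs)$ for large $s$, is easy to probe numerically. The following minimal sketch evaluates the ratio of integrals in eq. (66) for a single source value $x$ by the midpoint rule and compares it with $1/(rs)$. The values of $x$, $s$, $A$ and the grid size are illustrative assumptions, and the function name is ours.

```python
import math

def conditional_distortion(x, s, r, big_a, n_steps=200000):
    """E{|x - Y|^r | X = x} under w_s(y|x) of eq. (65), midpoint rule on [-A, A]."""
    h = 2 * big_a / n_steps
    num = 0.0
    den = 0.0
    for k in range(n_steps):
        y = -big_a + (k + 0.5) * h
        dist = abs(y - x) ** r
        weight = math.exp(-s * dist)   # unnormalized w_s(y|x); q(y) is constant
        num += dist * weight
        den += weight
    return num / den

x, s, big_a = 0.3, 50.0, 5.0
for r in (1.0, 2.0, 3.0):
    d_s = conditional_distortion(x, s, r, big_a)
    print(f"r={r:.0f}  D_s={d_s:.6f}  1/(rs)={1 / (r * s):.6f}")
```

Because the result does not depend on $x$ as long as $x$ is well inside $(-A, +A)$, averaging over $p(x)$ would leave it unchanged, which is the sense in which the outer integration becomes superfluous.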



with $K' = K - (\ln r)/r$. While this simple derivation does
not yet determine the constant $K'$, it does provide the correct
characteristics of the dependence of $R_q(D)$ upon $D$ for small
$D$. For the case of quadratic distortion, where $r = 2$, one
easily identifies the familiar factor of $1/2$ in front of the
log-distortion term.

The exact constant $K$ (or $K'$) can be determined by
returning to the original expression of $R_q(D)$ as the Legendre
transform of the log-moment generating function of the distortion
(cf. eq. (76)), and setting there $s = 1/(rD)$ as the minimizing
$s$ for the given $D$. The resulting expression turns out to be

$$K' = \ln\left[\frac{rA}{\Gamma(1/r)}\right] - \frac{1}{r}\ln(er). \tag{72}$$
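The constant in eq. (72) can be checked against a direct evaluation of the Legendre-transform expression. The sketch below assumes, for illustration only, a source value at $x = 0$ (any $x$ well inside $(-A, +A)$ gives the same $Z_x(s)$ in the high-resolution limit), quadratic distortion $r = 2$, $A = 5$ and $D = 0.01$; the function names are ours.

```python
import math

def log_z(s, r, big_a, n_steps=200000):
    """ln Z_0(s), where Z_0(s) = (1 / (2A)) * integral_{-A}^{A} exp(-s |y|^r) dy."""
    h = 2 * big_a / n_steps
    total = sum(math.exp(-s * abs(-big_a + (k + 0.5) * h) ** r) for k in range(n_steps))
    return math.log(total * h / (2 * big_a))

def k_prime(r, big_a):
    """Closed-form constant of eq. (72)."""
    return math.log(r * big_a / math.gamma(1.0 / r)) - math.log(math.e * r) / r

def objective(s, d, r, big_a):
    """The function s D + ln Z_x(s) whose minimum over s gives -R_q(D)."""
    return s * d + log_z(s, r, big_a)

r, big_a, d = 2.0, 5.0, 0.01
s_star = 1.0 / (r * d)                      # minimizing s in the high-resolution limit
rate_numeric = -objective(s_star, d, r, big_a)
rate_closed = k_prime(r, big_a) - math.log(d) / r
print(rate_numeric, rate_closed)
```

Evaluating `objective` slightly to either side of `s_star` gives larger values, consistent with $s = 1/(rD)$ being the minimizer.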



V. Conclusion 



In this paper, we derived relations between the rate-distortion
function $R_q(D)$ and the MMSE in estimating the
distortion given the source symbol. These relations have been
discussed from several aspects, and it was demonstrated how
they can be used to obtain upper and lower bounds on $R_q(D)$,
as well as the exact asymptotic behavior at very high and very
low distortion.

The bounds derived in our examples were induced from
purely mathematical bounds on the expression of the MMSE
directly. We have not explored, however, examples of bounds
on $R_q(D)$ that stem from estimation-theoretic bounds on
the MMSE, as was described in Section III. In future work,
it would be interesting to explore the usefulness of such
bounds as well. Another interesting direction for further work
would be to make an attempt to extend our results to rate-distortion
functions pertaining to more involved settings, such
as successive refinement coding, and situations that include
side information.

Appendix 



Proof of Theorem 1.

Consider a random selection of a codebook of $M = e^{nR}$
codewords, where the various codewords are drawn independently,
and each codeword, $\boldsymbol{Y} = (Y_1,\ldots,Y_n)$, is drawn
according to the product measure $Q(\boldsymbol{y}) = \prod_{i=1}^{n} q(y_i)$. Let
$\boldsymbol{x} = (x_1,\ldots,x_n)$ be a typical source vector, i.e., the number
of times each symbol $x \in \mathcal{X}$ appears in $\boldsymbol{x}$ is (very close
to) $np(x)$. We now ask what is the probability of the event
$\{\sum_{i=1}^{n} d(x_i, Y_i) \le nD\}$? As this is a large deviations event
whenever $D < \sum_{x,y} p(x)q(y)d(x,y)$, this probability must
decay exponentially with some rate function $I_q(D) > 0$, i.e.,

$$I_q(D) = \lim_{n\to\infty}\left[-\frac{1}{n}\ln\Pr\left\{\sum_{i=1}^{n} d(x_i, Y_i) \le nD\right\}\right]. \tag{73}$$

The function Iq{D) can be determined in two ways. The first 
is by the method of types jS), which easily yields 



$$I_q(D) = \min\left[I(X;Y') + D(q'\|q)\right], \tag{74}$$

where $Y'$ is an auxiliary random variable governed
by $q'(y) = \sum_{x\in\mathcal{X}} p(x)w(y|x)$, and the minimum is over
all conditional pmf's $\{w(y|x)\}$ that satisfy the inequality
$\sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} w(y|x)d(x,y) \le D$. The second method
is based on large deviations theory [?] (see also [8]), which
yields

$$I_q(D) = -\min_{s\ge 0}\left[sD + \sum_{x\in\mathcal{X}} p(x)\ln Z_x(s)\right]. \tag{75}$$



We first argue that $I_q(D) = R_q(D)$. The inequality $I_q(D) \le
R_q(D)$ is obvious, as $R_q(D)$ is obtained by confining the
minimization over the channels in (74) so as to comply with
the additional constraint that $\sum_{x\in\mathcal{X}} p(x)w(y|x) = q(y)$ for
all $y \in \mathcal{Y}$. The reversed inequality, $I_q(D) \ge R_q(D)$, is
obtained by the following coding argument: On the one hand,
a trivial extension of the converse to the rate-distortion coding
theorem [2, p. 317] shows that $R_q(D)$ is a lower bound
to the rate-distortion performance of any code that satisfies
$\frac{1}{n}\sum_{i=1}^{n}\Pr\{Y_i = y\} = q(y)$ for all $y \in \mathcal{Y}$.$^4$ On the other
hand, we next show that $I_q(D)$ is an achievable rate for codes
in this class.

Consider the random coding mechanism described in the
first paragraph of this proof, with $R = I_q(D) + \epsilon$, with $\epsilon > 0$
being arbitrarily small. Since the probability that, for a single
randomly drawn codeword, $\{\sum_{i=1}^{n} d(x_i, Y_i) \le nD\}$ is of
the exponential order of $e^{-nI_q(D)}$, the random selection
of a codebook of size $e^{n[I_q(D)+\epsilon]}$ constitutes $e^{n[I_q(D)+\epsilon]}$
independent trials of an experiment whose probability of
success is of the exponential order of $e^{-nI_q(D)}$. Using standard
random coding arguments, the probability that at least one
codeword in that codebook would fall within distance $nD$
from the given typical $\boldsymbol{x}$ becomes overwhelmingly large as
$n \to \infty$. Since this randomly selected codebook satisfies also
$\frac{1}{n}\sum_{i=1}^{n} 1\{Y_i = y\} \to q(y)$ in probability (as $n \to \infty$) for
all $y \in \mathcal{Y}$ (by the weak law of large numbers), $I_q(D)$
is an achievable rate within the class of codes that satisfy
$\frac{1}{n}\sum_{i=1}^{n} 1\{y_i = y\} \to q(y)$ for all $y \in \mathcal{Y}$.

Thus, $I_q(D) \ge R_q(D)$, which together with the reversed
inequality proved above, yields the equality $I_q(D) = R_q(D)$.
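The "standard random coding argument" invoked above rests on the following elementary computation: with $M = e^{n[I_q(D)+\epsilon]}$ independent trials, each succeeding with probability $e^{-nI_q(D)}$, the probability of at least one success is $1 - (1 - e^{-nI_q(D)})^M \ge 1 - \exp\{-e^{n\epsilon}\}$. The sketch below evaluates it for illustrative values of the rate function and $\epsilon$ (both hypothetical), ignoring sub-exponential factors and treating $M$ as a real number.

```python
import math

def prob_at_least_one_success(n, rate_fn, eps):
    """1 - (1 - p)^M with p = exp(-n I) and M = exp(n (I + eps)) independent draws."""
    p = math.exp(-n * rate_fn)
    m = math.exp(n * (rate_fn + eps))
    # (1 - p)^M = exp(M * log(1 - p)); log1p and expm1 keep precision for tiny p.
    return -math.expm1(m * math.log1p(-p))

rate_fn, eps = 0.2, 0.05   # illustrative values of I_q(D) and epsilon
for n in (10, 50, 100):
    print(n, prob_at_least_one_success(n, rate_fn, eps))
```

The probability of failure decays doubly exponentially in $n$, which is why an arbitrarily small $\epsilon > 0$ suffices.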

$^4$ To see why this is true, consider the functions $\delta_k(y)$, $k \in \mathcal{Y}$ (each of
which is defined as equal to one for $y = k$ and zero otherwise) as $|\mathcal{Y}|$ distortion
measures, indexed by $k \in \mathcal{Y}$, and consider the rate-distortion function w.r.t.
the usual distortion constraint and the $|\mathcal{Y}|$ additional "distortion constraints"
$E\{\delta_k(Y)\} \le q(k)$ for all $k \in \mathcal{Y}$, which, when satisfied, must all be
achieved with equality (since they must sum to unity). The rate-distortion
function w.r.t. these $|\mathcal{Y}| + 1$ constraints, which is exactly $R_q(D)$, is easily
shown (using the standard method) to be jointly convex in $D$ and $q$.



Consequently, according to eq. (75), we have established the
relation$^5$

$$R_q(D) = -\min_{s\ge 0}\left[sD + \sum_{x\in\mathcal{X}} p(x)\ln Z_x(s)\right]. \tag{76}$$



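As this minimization problem is a convex problem ($\ln Z_x(s)$
is convex in $s$), the minimizing $s$ for a given $D$ is obtained
by equating the derivative of the r.h.s. to zero, which leads to

$$D = -\sum_{x\in\mathcal{X}} p(x)\cdot\frac{\partial\ln Z_x(s)}{\partial s} = \sum_{x\in\mathcal{X}} p(x)\cdot\frac{\sum_{y\in\mathcal{Y}} q(y)d(x,y)e^{-sd(x,y)}}{\sum_{y\in\mathcal{Y}} q(y)e^{-sd(x,y)}}. \tag{77}$$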
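To illustrate how eqs. (76) and (77) are used together, the sketch below solves eq. (77) for $s$ by bisection (the right-hand side is non-increasing in $s$) and then evaluates eq. (76). As a test case we assume the binary symmetric source with Hamming distortion and uniform $q$, for which the result should coincide with the well-known $\ln 2 - h(D)$ nats; the function names and the search interval are our own choices.

```python
import math

def log_z(s, q, d_row):
    """ln Z_x(s) = ln sum_y q(y) exp(-s d(x, y)) for one source symbol x."""
    return math.log(sum(qy * math.exp(-s * dxy) for qy, dxy in zip(q, d_row)))

def distortion_at(s, p, q, d):
    """Right-hand side of eq. (77): the distortion level attained at parameter s."""
    total = 0.0
    for px, d_row in zip(p, d):
        w = [qy * math.exp(-s * dxy) for qy, dxy in zip(q, d_row)]
        total += px * sum(wy * dxy for wy, dxy in zip(w, d_row)) / sum(w)
    return total

def rate_legendre(d_level, p, q, d, s_hi=50.0, iters=200):
    """Solve eq. (77) for s by bisection, then evaluate eq. (76). Returns (s, rate)."""
    lo, hi = 0.0, s_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if distortion_at(mid, p, q, d) > d_level:
            lo = mid          # distortion still too large: s must increase
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    rate = -(s * d_level + sum(px * log_z(s, q, row) for px, row in zip(p, d)))
    return s, rate

p = [0.5, 0.5]                 # binary symmetric source
q = [0.5, 0.5]                 # uniform reproduction pmf
d = [[0.0, 1.0], [1.0, 0.0]]   # Hamming distortion
s_opt, rate = rate_legendre(0.1, p, q, d)
print(s_opt, rate)
```

For this test case the minimizing parameter is $s = \ln[(1-D)/D]$, so both returned values can be compared with closed-form expressions.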



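This equation yields the distortion level $D$ for a given value
of the minimizing $s$ in eq. (76). Let us then denote

$$D_s = \sum_{x\in\mathcal{X}} p(x)\cdot\frac{\sum_{y\in\mathcal{Y}} q(y)d(x,y)e^{-sd(x,y)}}{\sum_{y\in\mathcal{Y}} q(y)e^{-sd(x,y)}}. \tag{78}$$

This notation obviously means that

$$R_q(D_s) = -sD_s - \sum_{x\in\mathcal{X}} p(x)\ln Z_x(s). \tag{79}$$

Taking the derivative of (78), we readily obtain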



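$$\begin{aligned}
\frac{\mathrm{d}D_s}{\mathrm{d}s} &= \sum_{x\in\mathcal{X}} p(x)\cdot\frac{\partial}{\partial s}\left[\frac{\sum_{y\in\mathcal{Y}} q(y)d(x,y)e^{-sd(x,y)}}{\sum_{y\in\mathcal{Y}} q(y)e^{-sd(x,y)}}\right] \\
&= -\sum_{x\in\mathcal{X}} p(x)\cdot\left[\frac{\sum_{y\in\mathcal{Y}} q(y)d^2(x,y)e^{-sd(x,y)}}{\sum_{y\in\mathcal{Y}} q(y)e^{-sd(x,y)}} - \left(\frac{\sum_{y\in\mathcal{Y}} q(y)d(x,y)e^{-sd(x,y)}}{\sum_{y\in\mathcal{Y}} q(y)e^{-sd(x,y)}}\right)^2\right] \\
&= -\sum_{x\in\mathcal{X}} p(x)\cdot\mathrm{Var}_s\{d(x,Y)|X = x\} \\
&= -\mathrm{mmse}_s(\Delta|X),
\end{aligned} \tag{80}$$

where $\mathrm{Var}_s\{d(x,Y)|X = x\}$ is the variance of $d(x,Y)$ w.r.t.
the conditional pmf $\{w_s(y|x)\}$. The last line follows from
the fact that the expectation of $\mathrm{Var}_s\{d(X,Y)|X\}$ w.r.t. $\{p(x)\}$
is exactly the MMSE of $d(X,Y)$ based on $X$. The integral
forms of this equation are then precisely as in part (a) of the
theorem with the corresponding integration constants. Finally,
differentiating both sides of eq. (79), we get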



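$$\begin{aligned}
\frac{\mathrm{d}R_q(D_s)}{\mathrm{d}s} &= -s\cdot\frac{\mathrm{d}D_s}{\mathrm{d}s} - D_s - \sum_{x\in\mathcal{X}} p(x)\cdot\frac{\partial\ln Z_x(s)}{\partial s} \\
&= -s\cdot\frac{\mathrm{d}D_s}{\mathrm{d}s} - D_s + D_s \\
&= -s\cdot\frac{\mathrm{d}D_s}{\mathrm{d}s} \\
&= s\cdot\mathrm{mmse}_s(\Delta|X),
\end{aligned} \tag{81}$$

which, when integrated back, yields part (b) of the theorem.
This completes the proof of Theorem 1.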
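The two differential relations just proved, eqs. (80) and (81), can be verified numerically on a small finite-alphabet example. The sketch below uses an arbitrary source pmf, reproduction pmf and distortion matrix (all illustrative assumptions, as are the function names) and compares central finite differences of $D_s$ and $R_q(D_s)$ with $-\mathrm{mmse}_s(\Delta|X)$ and $s\cdot\mathrm{mmse}_s(\Delta|X)$, respectively.

```python
import math

p = [0.2, 0.5, 0.3]                        # source pmf p(x) (illustrative)
q = [0.6, 0.4]                             # reproduction pmf q(y) (illustrative)
d = [[0.0, 1.0], [0.5, 0.3], [2.0, 0.1]]   # distortion matrix d(x, y) (illustrative)

def tilted(s, d_row):
    """Return (w_s(.|x), Z_x(s)), where w_s(y|x) is proportional to q(y) exp(-s d(x, y))."""
    w = [qy * math.exp(-s * dxy) for qy, dxy in zip(q, d_row)]
    z = sum(w)
    return [wy / z for wy in w], z

def distortion(s):
    """D_s of eq. (78)."""
    return sum(px * sum(wy * dxy for wy, dxy in zip(tilted(s, row)[0], row))
               for px, row in zip(p, d))

def rate(s):
    """R_q(D_s) of eq. (79)."""
    return -s * distortion(s) - sum(px * math.log(tilted(s, row)[1])
                                    for px, row in zip(p, d))

def mmse(s):
    """mmse_s(Delta|X): the p-average of the conditional variance of d(x, Y)."""
    total = 0.0
    for px, row in zip(p, d):
        w, _ = tilted(s, row)
        mean = sum(wy * dxy for wy, dxy in zip(w, row))
        second = sum(wy * dxy**2 for wy, dxy in zip(w, row))
        total += px * (second - mean**2)
    return total

s, h = 1.3, 1e-4
dD_ds = (distortion(s + h) - distortion(s - h)) / (2 * h)
dR_ds = (rate(s + h) - rate(s - h)) / (2 * h)
print(dD_ds, -mmse(s))
print(dR_ds, s * mmse(s))
```

Note also that `rate(0.0)` vanishes, since $Z_x(0) = 1$ for every $x$, in line with the integration constant of the rate integral starting at $s = 0$.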

$^5$ Eq. (76) appears also in [5, p. 90, Corollary 4.2.3], with a completely
different proof, for the special case where $q$ minimizes both sides of the
equation (and hence it refers to $R(D)$). However, the extension of that proof to
a generic $q$ does not appear to be straightforward because here the minimization
over the channels is limited by the reproduction distribution constraint.